Load government data in an asynchronous task #1522
Conversation
Force-pushed from 746d3e2 to b89f071
```ruby
module BulkData
  class Cache
    include Singleton
```
Nice! I'm surprised you don't need to require "singleton" here.
Ah Rails auto/eager loading takes care of it.
Well done on getting this significant piece of work to this point! I've attempted to review it; some of my questions/suggestions may be missing some context, so do feel free to bat back on them.
```ruby
describe ".wrote_at" do
  it "returns the time the entry was created" do
    time = 2.hours.ago.change(usec: 0)
```
A shame that the .change(usec: 0) bit is needed. Why is that?
It is a shame indeed. It's because travel_to does the same to the time it stubs, so we need ours to match: https://github.com/rails/rails/blob/8bec77cc0f1fd47677a331a64f68c5918efd2ca9/activesupport/lib/active_support/testing/time_helpers.rb#L145
I could use an absolute time, but I need it to be within the last 24 hours, so it's tricky :-/
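For illustration, a hedged sketch of why the truncation is needed (assuming ActiveSupport's travel_to helper):

```ruby
# travel_to zeroes the microseconds on the time it stubs, so a stored
# timestamp only compares equal if we truncate ours the same way.
time = 2.hours.ago.change(usec: 0)

travel_to(time) do
  Time.zone.now == time # => true, because both sides have usec: 0
end
```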
```ruby
@@ -64,6 +64,7 @@ def and_a_publish_intent_is_created

def when_the_publishing_job_runs
  travel_to(Time.zone.parse("2019-6-22 9:00"))
  populate_default_government_bulk_data
```
I think the fact that you've had to make this change in 3 places suggests we should move the when_the_publishing_job_runs implementation to somewhere shared 🤔 all three implementations are almost identical.
I think ideally we'd just have fewer feature tests for this, with more of the other conditions handled by request specs, where this wouldn't be a concern. It doesn't seem out of place to do this sort of thing a few times in our feature specs.
I was rather sad when I realised the effect of travel_to on clearing the cache and that I had to do this.
What was the effect of travel_to on the cache?
We cache bulk data for 24 hours. If we travel more than 24 hours past the time it was written, the cache entry is invalidated. Here we jump from June 21st 0:00am to June 22nd 9:00am.
```erb
<%= render partial: "errors/error", locals: { title: title, body_govspeak: body_govspeak } %>
<% else %>
<%= render partial: "errors/error", locals: t("errors.local_data_unavailable") %>
<% end %>
```
This whole template is smashing 👍
Thanks 😁
This looks great! 💯
The commit messages are great and make the code changes easy to follow, and the tests are really clear. Thanks for breaking up the commit history into smaller commits too, it made reviewing this much easier.
I only have a few minor comments and things that could be clarified, but nothing major.
spec/requests/unwithdraw_spec.rb (outdated)
describe "POST /documents/:document/unwithdraw" do | ||
context "when the edition is in history mode" do | ||
let(:edition) do | ||
create(:edition, :withdrawn, :political, government: past_government) | ||
end |
I'm not sure why this is being defined twice now, rather than once at the beginning?
This is now moot as I had to rebase against Peter's changes, which have outdated this.
I tend to try to have lets near the code that uses them. I don't mind a bit of repetition if it's easy to find where the let is defined, mainly because I find it difficult to read tests where let definitions occur a long way before a test.
spec/requests/withdraw_spec.rb (outdated)
describe "POST /documents/:document/withdraw" do | ||
context "when the edition is in history mode" do | ||
let(:edition) do | ||
create(:edition, :published, :political, government: past_government) | ||
end |
Same here. Is there a reason for duplicating this setup?
Force-pushed from b89f071 to 40f7c2f
Bulk data is the namespace that we will be using for downloading large amounts of data periodically for use in Content Publisher. Some examples of candidates for this are topics, contacts and organisations. These are data entities where we need to download all of the Publishing API's data rather than a small sample on demand. Previously we loaded this data on demand, which risks periodic timeouts for the application. As this data can be loaded in numerous places, it also makes performance issues difficult to diagnose, since it may be a contributing factor to any other task we think may be slow.

This module contains two common exceptions that we anticipate using. LocalDataUnavailableError is for situations where we expect data to be available in our cache but it is not present. RemoteDataUnavailableError is for when we try to load remote data but this fails.
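A minimal sketch of what that namespace could look like (the error class names are quoted from the commit message; everything else is an assumption):

```ruby
module BulkData
  # Raised when data we expect to find in the local cache is missing.
  class LocalDataUnavailableError < RuntimeError; end

  # Raised when an attempt to load data from a remote source
  # (e.g. the Publishing API) fails.
  class RemoteDataUnavailableError < RuntimeError; end
end
```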
Force-pushed from 40f7c2f to 59f63b2
@leenagupte and @ChrisBAshton thanks for the reviews. I've addressed the comments.
This class is intended to be the entry point for writing any BulkData. It uses the Rails Redis cache as an underlying store. Using Redis was a design choice for the following reasons:

- Redis offers caching across multiple GOV.UK boxes, so one write will cover all machines
- Redis is fast to read and write from

And using the Rails cache implementation offered the following advantages:

- Can be configured with [middleware][1] to avoid the need to hit Redis multiple times in a request
- Redis namespacing
- Automatic serialisation/deserialisation

An important concern is that we need to ignore the guidance Rails gives to use an LRU Redis cache, as we don't want keys automatically cleared out of Redis; instead we just want them to expire naturally.

The methods on this class allow adding an item to the cache, removing an item, and removing all items. On top of this there are time-based methods intended to determine when an item was added to the cache, to enable checking how old something is when deciding whether to write. This is a feature Redis doesn't have by default, so it's quite simple to just add it as a separate entry. The Rails cache entry actually has some info about created time, but I have no idea how to access it and also think it's probably nicer not to have to pull out the whole cache entry to check it.

[1]: https://github.com/rails/rails/blob/9926fd86626690746c7ee78174c4d3577bdfa58d/activesupport/lib/active_support/cache/redis_cache_store.rb#L44
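A hedged sketch of what this class might look like; the method names and the separate written-at entry are assumptions built from the commit message and the `.wrote_at` spec seen earlier in the diff:

```ruby
require "singleton"

module BulkData
  class Cache
    include Singleton

    EXPIRY = 24.hours

    def self.write(key, value)
      instance.store.write(key, value, expires_in: EXPIRY)
      # The Rails cache doesn't expose an entry's created-at time through
      # its public API, so record the write time as a companion entry.
      instance.store.write("#{key}:written-at", Time.zone.now, expires_in: EXPIRY)
    end

    def self.read(key)
      instance.store.read(key)
    end

    def self.wrote_at(key)
      instance.store.read("#{key}:written-at")
    end

    def self.delete(key)
      instance.store.delete(key)
      instance.store.delete("#{key}:written-at")
    end

    def self.clear
      instance.store.clear
    end

    def store
      @store ||= ActiveSupport::Cache::RedisCacheStore.new(namespace: "bulk-data")
    end
  end
end
```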
This sets up this model to work with data from the Publishing API. It no longer has class methods to look up the model as these are going to be on a different class. The input we expect from the Publishing API is a hash of base_path, content_id, details, locale and title. We can look up started_on, ended_on and current dates from the details hash.
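A sketch of the model this could produce; the attribute names come from the commit message, the rest is assumed:

```ruby
class Government
  attr_reader :base_path, :content_id, :details, :locale, :title

  def initialize(payload)
    @base_path = payload["base_path"]
    @content_id = payload["content_id"]
    @details = payload["details"] || {}
    @locale = payload["locale"]
    @title = payload["title"]
  end

  # started_on, ended_on and current are looked up from the details hash
  def started_on
    Date.parse(details["started_on"]) if details["started_on"]
  end

  def ended_on
    Date.parse(details["ended_on"]) if details["ended_on"]
  end

  def current?
    details["current"] == true
  end
end
```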
This allows classes that mix this module in to be serialised to a hash. This has been added so that instances that use this module can be serialised to fake cache data.
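A minimal sketch of such a mixin (the module and method names are guesses):

```ruby
module HashSerialisable
  # Serialise the instance's state to a hash keyed by attribute name,
  # mirroring the payload shape the instance was built from.
  def to_h
    instance_variables.each_with_object({}) do |ivar, hash|
      hash[ivar.to_s.delete("@")] = instance_variable_get(ivar)
    end
  end
end
```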
Have skimmed through all the commits again, looks good to me 👍
Force-pushed from 59f63b2 to 9249b84
The purpose of this class is to mediate communication between the Publishing API and our cache adapter for governments. It writes entries to the cache using a cache key that is defined as a constant in this class. This is named "government-v1" to indicate that this is version 1 of the data we are caching. If we make any backwards-incompatible changes to the data we expect (such as adding an extra mandatory field) then we will need to increment this number. This ensures that when the code change is deployed the old cache will no longer be used and will expire naturally. It provides a number of methods for finding a particular government and a method to populate the cache.
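A hedged sketch of the repository; the cache key is quoted from the commit message, while the finder names and the Publishing API call are assumptions:

```ruby
class GovernmentRepository
  CACHE_KEY = "government-v1".freeze

  def find(content_id)
    all.find { |government| government.content_id == content_id }
  end

  def current
    all.find(&:current?)
  end

  def populate_cache
    # Hypothetical fetch: load every government from the Publishing API
    # and store the serialised result under the versioned key.
    payload = fetch_governments_from_publishing_api
    BulkData::Cache.write(CACHE_KEY, payload)
  end

  private

  def all
    data = BulkData::Cache.read(CACHE_KEY)
    raise BulkData::LocalDataUnavailableError unless data

    data.map { |attributes| Government.new(attributes) }
  end
end
```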
This clears the BulkData::Cache after every spec and provides helpers to populate the cache before tests. Default governments are set up for the most common case, where we just need to know about the current and previous governments. It also provides helper methods for the current and past government, as these are likely to be needed quite frequently.
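A sketch of how this spec support could be wired; the helper names follow those seen in the diff, while the factory details are assumptions:

```ruby
RSpec.configure do |config|
  config.include GovernmentHelpers

  # Reset the cache between examples so specs don't leak data.
  config.after { BulkData::Cache.clear }
end

module GovernmentHelpers
  def populate_default_government_bulk_data
    populate_government_bulk_data(past_government, current_government)
  end

  def populate_government_bulk_data(*governments)
    BulkData::Cache.write(GovernmentRepository::CACHE_KEY, governments.map(&:to_h))
  end

  def current_government
    @current_government ||= build(:government) # factory is hypothetical
  end

  def past_government
    @past_government ||= build(:government, :past) # trait is hypothetical
  end
end
```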
This creates instances of the GovernmentRepository class whenever we need to load governments in the context of an edition. The factory has been updated to reflect that governments can't be determined without an external dependency, so governments are always injected. This does raise the prospect of some potential performance concerns over the number of objects initialised if we are frequently looking up government information. Mostly, though, I expect these look-ups not to occur multiple times in the same request, so I don't think it'll cause any actual performance issues.
This removes unnecessary instance variables and reduces long lines, it also removes an unnecessary test expectation.
By using I18n.t we wouldn't raise errors if a translation was missing. We don't actually need to call I18n directly, as Rails has a view helper, t, which roughly corresponds to I18n.t!
This field is referred to as a title in the Publishing API
This sets up some default governments in advance of tests so that the government data is populated. These are set up before all feature and request tests.
By jumping more than 24 hours into the future the cache entries for governments expire, thus the government data needs to be re-stubbed.
As we no longer have past_government and current_government traits, this injects these objects instead.
This cuts back on the boilerplate needed to run a worker job on a single box and stop other boxes that try to run it at the same time. This situation occurs with the sidekiq scheduler, where every box with a sidekiq process tries to run the job at the same time and the first one to start it wins. We counter this with a PostgreSQL advisory lock, which prevents any other clients from running the job while the lock is held (they gracefully exit). Ideally we want to take this lock inside a transaction so that if anything goes wrong the lock is automatically released, whereas if we take it outside a transaction and something crashes we may need to manually release the lock.
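A minimal sketch of the exclusive-run helper under those assumptions (the method name and lock id are hypothetical):

```ruby
class ApplicationJob < ActiveJob::Base
  EXCLUSIVE_LOCK_ID = 1016 # arbitrary application-wide advisory lock number

  def run_exclusively
    ActiveRecord::Base.transaction do
      # pg_try_advisory_xact_lock returns false immediately if another
      # session already holds the lock, letting losing boxes exit
      # gracefully. The lock is released automatically when the
      # transaction ends, even if the block raises.
      acquired = ActiveRecord::Base.connection
                   .select_value("SELECT pg_try_advisory_xact_lock(#{EXCLUSIVE_LOCK_ID})")
      yield if acquired
    end
  end
end
```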
It was more common that we were using Sidekiq::Testing.fake! than relying on Sidekiq::Testing.inline!, so switching to it saves us code. This is also the preferred approach since it keeps us more honest: we have to explicitly tell Sidekiq to perform actions rather than expecting them to just happen synchronously.
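For illustration, the difference as seen through Sidekiq's testing API (SomeWorker is a hypothetical plain Sidekiq worker):

```ruby
require "sidekiq/testing"

# With fake!, jobs accumulate in an in-memory array and nothing runs
# until we drain explicitly, keeping the asynchronous boundary visible.
Sidekiq::Testing.fake!

SomeWorker.perform_async
SomeWorker.jobs.size # => 1, nothing has executed yet
SomeWorker.drain     # explicitly perform the queued jobs

# inline!, by contrast, would have run perform_async synchronously,
# hiding the fact that the work is normally asynchronous.
```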
The first_published_at value should always be set on documents that have a live edition. I didn't know a nice way to specify an optional trait to the association method so have ended up with this array monstrosity. If you know a better technique feel free to change it.
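A guess at the pattern being described: FactoryBot's association accepts an array of factory name plus traits, which works but reads awkwardly (the factory and trait names here are hypothetical):

```ruby
FactoryBot.define do
  factory :edition do
    trait :live do
      live { true }
      # the "array monstrosity": factory name and traits passed together
      association :document, factory: %i[document with_live_edition]
    end
  end

  factory :document do
    trait :with_live_edition do
      first_published_at { Time.zone.now }
    end
  end
end
```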
This moves the it block outside of a context so it sits before the contexts, as per the more common convention. It also removes the government and political tests from their draft and live contexts since they aren't resultant effects of those particular contexts. Finally it makes the draft and live tests more consistent with the other edition state tests by creating an edition rather than a document with factory bot.
This sets up a job that is run every 15 minutes to update the bulk data. If accessing the data fails this job is retried. The schedule is set with an interval of 15 minutes and a delay of 0s which makes the job start immediately whenever the worker process is restarted. As this job is run on a schedule it will execute on all machines at the same time. To counter this the job is run exclusively using the method on the parent class.
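A hedged sketch of what the job could look like; retry_on is the real ActiveJob API, while the class and method names are assumptions:

```ruby
class PopulateBulkDataJob < ApplicationJob
  # Retry if the Publishing API couldn't be reached.
  retry_on BulkData::RemoteDataUnavailableError, wait: :exponentially_longer

  def perform
    # run_exclusively (see the advisory-lock sketch above) ensures only
    # one box actually does the work when the schedule fires everywhere.
    run_exclusively do
      GovernmentRepository.new.populate_cache
    end
  end
end
```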
This will schedule a background job to load the data if for some reason it's not available in the current environment and a service has tried to use it. There is a risk that this could cause a spike in identical queued jobs were a bunch of users to get this at the same time. This shouldn't present much of a problem as the job only executes one at a time and won't do anything if the data was updated in the last 5 minutes.
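One way that fallback could look (an assumption, not necessarily the PR's exact code):

```ruby
def government_for(date)
  repository.find_government_on(date) # hypothetical finder
rescue BulkData::LocalDataUnavailableError
  # Queue a repopulation and let the error propagate so the user sees
  # the 503 page. The job runs exclusively and is assumed to skip work
  # if the data was already updated in the last 5 minutes.
  PopulateBulkDataJob.perform_later
  raise
end
```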
If the application is ever in a state where local data is unavailable, it will return a 503 response and a simple error message. As I anticipate that developers may often be the people who see this message (at least while sidekiq isn't run by default with the app) I thought it was useful to have explicit instructions in a development environment. It was pretty hard to think of the best way to test this, so I went with this rather contrived example since documents_show is one of the main routes. I expect that when the BulkData technique is used as part of index page filters we can replace this test with a less contrived one.
This is no longer needed given we are now using the Publishing API
This spec relies on a government existing at the date we specify which caused this test to fail. By creating the government in the test we resolve this complication.
This sets up a local cache that wraps around the Redis cache layer of BulkData::Cache for the duration of a request. In order for this to work in a development environment we need the BulkData::Cache class not to be reloaded automatically. I achieved this by doing direct require calls. I first tried using the Rails autoload_once_paths option but didn't have much luck. As an example, before fixing the require calls:

```
irb(main):002:0> BulkData::Cache.object_id
=> 70342543814920
irb(main):003:0> reload!
Reloading...
=> true
irb(main):004:0> BulkData::Cache.object_id
=> 70342595490220
```

and after:

```
irb(main):002:0> BulkData::Cache.object_id
=> 70249028613320
irb(main):004:0> reload!
Reloading...
=> true
irb(main):005:0> BulkData::Cache.object_id
=> 70249028613320
```
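A sketch of how the per-request wrapping could be wired up: Rails cache stores that include the LocalCache strategy (RedisCacheStore among them) expose a middleware that scopes an in-memory cache to each request. The exact integration here is an assumption:

```ruby
# config/application.rb — the store's middleware memoises reads for the
# duration of a request, so repeated look-ups hit Redis only once.
config.middleware.use BulkData::Cache.instance.store.middleware
```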
By default the RedisCacheStore takes a fault-tolerant approach, where it returns nil if it can't access Redis and writes to a logger that we haven't configured [1]. This isn't an ideal configuration for our use case, as it's a critical problem if the cache isn't available. To try to resolve this we allow 3 reconnect attempts if we don't manage to connect to Redis or have lost our connection. In the event of an error reading from the cache, this reports the error to Sentry, which will hopefully make problems easier to diagnose. [1]: https://github.com/rails/rails/blob/09a2979f75c51afb797dd60261a8930f84144af8/activesupport/lib/active_support/cache/redis_cache_store.rb#L472-L485
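A sketch of that configuration; reconnect_attempts and error_handler are real RedisCacheStore options, and GovukError is the GOV.UK Sentry wrapper:

```ruby
ActiveSupport::Cache::RedisCacheStore.new(
  namespace: "bulk-data",
  reconnect_attempts: 3,
  error_handler: lambda do |method:, returning:, exception:|
    # Report cache failures to Sentry rather than failing silently.
    GovukError.notify(exception)
  end,
)
```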
There's not a whole lot we can do when a dependent service is erroring, so let's not report it and wait to see if it causes problems in the app. This resolves an issue where we started getting server errors overnight in integration at the point of the data sync.
Force-pushed from 9249b84 to d395aa5
Thanks for addressing the comments.
LGTM 👍 🥇
Since alphagov/content-publisher#1522 was introduced Content Publisher has required a worker to operate normally. Because of this it is now added as an app dependency.
Trello: https://trello.com/c/TQ6GPOHU/1228-spike-into-loading-governments-from-publishing-api
This introduces a new system, called BulkData, which is used to load data asynchronously for use by Content Publisher. It is intended for situations where we need to load a large amount of data that we would cache; loading that data on demand can cause timeouts at various places across the app and is thus undesirable. The replacement uses a regular ActiveJob that updates the data every 15 minutes, moving the performance impact to a background concern. This builds on the work identified in #1021 and is the first step towards implementing it, to resolve various timeout problems.
What this PR does specifically is start an ActiveJob whenever the application is deployed, and also run it every 15 minutes. This job calls the Publishing API to access all governments and stores the response in Redis for 24 hours.
Whenever Content Publisher needs to look up governments it will query Redis for this data. If the data is unavailable it will schedule the job to populate the data and raise an exception. Users will be shown an error until this data is re-populated.
Users are likely to see this error fairly often in development environments if they are not used to starting the worker alongside the app. As a follow-up we should update govuk-docker to start content-publisher with a worker.
Screenshots of error pages: Production / Test and Development (images omitted).
Further information in the commits